
Molecular Ecology Resources

Wiley

Preprints posted in the last 30 days, ranked by how well they match Molecular Ecology Resources's content profile, based on 161 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit.

1
A Novel eDNA-Based Approach for Hybrid Detection: Implications for Conservation Management

Sakata, M. K.; Yano, N.; Imamura, A.; Yamanaka, H.; Minamoto, T.

2026-03-27 ecology 10.64898/2026.03.26.714632 medRxiv
Top 0.1%
34.9%

Hybridization between invasive and native species poses a hidden but critical threat to biodiversity. While environmental DNA (eDNA) has revolutionized species monitoring, it has lacked the resolution to detect hybrid individuals. Here, we present the first experimental demonstration of hybrid identification using eDNA. Our method isolates a single cell in the environment (hereafter, eCell) and enables cellular-level analysis using multiplex digital PCR targeting nuclear markers from both parental species. Validation with controlled tank experiments using Oncorhynchus masou masou × Salvelinus leucomaenis leucomaenis hybrid individuals confirmed the method's ability to separately detect hybrid individuals from co-habiting purebred parent individuals. This eCell analysis overcomes the limitations of traditional eDNA methods and offers a scalable, non-invasive tool for detecting cryptic hybridization. By enabling early and accurate detection of hybrid individuals, it supports timely conservation decisions, including management prioritization and the protection of purebred populations. This novel technique bridges a critical gap in conservation genetics and enhances eDNA's utility for biodiversity management in the face of global change.

2
Protocol for genotyping cephalopod sex using a skin swab and quantitative PCR

Montague, T. G.; Rubino, F. A.; Gibbons, C. J.; Mungioli, T. J.; Small, S. T.; Coffing, G. C.; Kern, A. D.

2026-04-02 molecular biology 10.64898/2026.03.31.715692 medRxiv
Top 0.1%
28.2%

The coleoid cephalopods (octopus, cuttlefish, and squid) are emerging model organisms for neuroscience, development, and evolutionary biology. Determining their sex early in life is critical for population management and controlled experiments. Here, we present a protocol to non-invasively determine the sex of multiple cephalopod species as young as 3 hours post-hatching using a skin swab and quantitative PCR (qPCR). We describe steps for designing qPCR primers, swabbing live animals, extracting DNA, running the qPCR, and analyzing the results. For complete details on the use and execution of this protocol, please refer to Rubino et al. [1].
Highlights: Swab live cephalopods as early as 3 hours post-hatching; extract DNA from cephalopod skin swabs; perform qPCR-based sex determination; design and validate qPCR primers for new species.
Graphical abstract: Figure 1 (image not shown).

3
On-site metabarcoding analysis of environmental DNA samples

Mauvisseau, Q.; Ewer, I.; Blumeris, I.; Iren Bongo, S.; Filipe Brito de Oliveira, L.; Gouvea, B.; Carolina Cei, A.; Ferreira Rodrigues, K.; de Arruda Francisco, J.; Sletteng Garvang, E.; Marena do Rego Henriques, V.; Hurtado Solano, S.; Kvalheim, L.; Kaylynne Lawrence, S.; Ramalho Maciel, B.; Isanda Masaki, H.; Fortunate Mashaphu, M.; Masimula, L.; Prudent Mokgokong, S.; Katrin Onshuus, E.; Lima Paiva, B.; Parker-Allie, F.; Du Plessis, M.; Puzicha, M.; Gabriel Da Silva Solano Reis, O.; Speelman, G.; Moritz Splitthof, W.; Stocco de Lima, A. C.; Strindberg, H.; Smoge Saevik, O.; Tafjord, N. J. D

2026-03-30 ecology 10.64898/2026.03.27.714757 medRxiv
Top 0.1%
18.9%

Environmental DNA metabarcoding is a powerful monitoring tool for assessing aquatic biodiversity, as well as the sustainability and impacts of fisheries and aquaculture. However, conventional laboratory workflows remain time-consuming and dependent on dedicated infrastructures. Here, we present a field trial of a fully portable, off-grid eDNA metabarcoding pipeline that enables end-to-end analysis within a few days using compact equipment, including a BentoLab workstation and an Oxford Nanopore Technologies (ONT) MinION sequencer. The workflow was implemented during two international training courses in Norway and Brazil, where students and early career researchers collected environmental samples, extracted and amplified DNA, prepared DNA libraries, and sequenced on-site before performing bioinformatics and statistical analyses. In the case study detailed here, seven eDNA samples collected and analysed on-site in the Oslofjord allowed detection of 16 fish and elasmobranch species. Although overall diversity was lower than in earlier studies using Illumina-based sequencing, our protocol reliably detected key species and demonstrates that portable eDNA metabarcoding is feasible for rapid ecological assessment, surveillance of high-risk regions and/or deployment in remote or resource-limited settings.

4
Assessing the potential of bee-collected pollen sequence data to train machine learning models for geolocation of sample origin

Hayes, R. A.; Kern, A. D.; Ponisio, L. C.

2026-04-01 bioinformatics 10.64898/2026.03.29.715128 medRxiv
Top 0.1%
14.9%

Pollen is a robust and widespread substance that captures a historical snapshot of a specific time and place, and it can be used to track movements through space by examining the pollen deposited on various objects. Palynology, the study of pollen, is used across fields such as conservation, natural history, and forensics, where it is particularly useful for tracing the origin and movement of objects. However, pollen has remained underutilized due to the difficulty of distinguishing many pollen taxa beyond the family level and limited pollen reference material to support location predictions. With recent developments in pollen DNA metabarcoding these issues have been rectified, but much of the available pollen data are primarily from wind-pollinated species, which are widespread and less informative of specific sample locations. Bee-collected pollen presents an untapped resource in training predictive models to geolocate sample origin. Here we compiled bee-collected pollen DNA sequence relative abundance data from three projects in the western U.S. and assessed the accuracy of supervised machine learning models to predict the location of sample origin based solely on pollen assemblage, without the need of incorporating additional data. Random Forest and k-Nearest Neighbors models yielded high accuracy across all projects. We also found that models trained on taxonomically clustered pollen assigned sequence variants (ASVs) performed slightly better than those trained on raw sequence data, but the difference was minor, indicating that models trained on raw sequence data can reliably predict location and avoid the time-consuming taxonomic assignment process. Our results demonstrate the utility of repurposing bee-collected pollen for geolocation and provide a framework for employing supervised machine learning in future geolocation efforts. 
Highlights: Bee-collected pollen metabarcoding data was used to accurately predict sample origin; Random Forest and k-Nearest Neighbors algorithms were most accurate with lowest error; taxonomically-classified and raw DNA sequence data training sets performed comparably.

5
Novabrowse: A Tool for High-Resolution Synteny Analysis, Ortholog Detection, and Gene Signal Discovery

Rikk, L.; Ghaffarinia, A.; Leigh, N. D.

2026-03-30 genomics 10.64898/2026.03.27.714371 medRxiv
Top 0.1%
9.8%

Accurate genome annotation remains challenging as assembly quality often exceeds annotation reliability. Resolving ambiguities of gene presence, absence, and orthology typically requires integrating two complementary lines of evidence: sequence homology between species and the conservation of gene order (i.e., synteny). BLAST remains the standard for homology detection, yet its raw output can be difficult to interpret. Existing tools address this challenge but operate at opposing scales. Alignment viewers provide detailed pairwise statistics without genomic context, while synteny tools offer chromosome-scale perspectives without sequence-level resolution. To fill this intermediate gap, we developed Novabrowse, an interactive BLAST results interpretation framework featuring high-resolution multi-species synteny analysis, chromosomal re-arrangement investigation, ortholog detection, and gene signal discovery. Users define a genomic region of interest in a query species and/or use custom sequences, then select one or more subject species for comparison. The pipeline retrieves query gene sequences via NCBI API integration and performs BLAST searches against each subject transcriptome or genome. Results are presented via an interactive HTML file featuring alignment statistics, chromosomal maps, coverage visualizations, ribbon plots, and distance-based clustering of high-scoring segment pairs into putative gene units. We demonstrate these capabilities by investigating Foxp3, Aire, and Rbl1, three highly conserved vertebrate genes, in the recently assembled genome of the newt Pleurodeles waltl. Foxp3 and Aire have not been described in any salamander species to date, despite availability of multiple assemblies and extensive transcriptomic datasets. Using Novabrowse, we discovered conserved loci and gene signals for both genes in P. waltl, the presence of which was subsequently confirmed via Nanopore long-read RNA sequencing. 
In contrast, Rbl1 analysis uncovered a chromosomal rearrangement at its expected locus with no gene signal detected, indicating a gene loss specific to P. waltl despite the gene's retention in the closely related axolotl (Ambystoma mexicanum). Our findings demonstrate Novabrowse's capacity for evidence-based evaluation of annotation artifacts, an essential capability as high-quality assemblies become more available for phylogenetically diverse species. Novabrowse is open source (MIT license) and freely available at: https://github.com/RegenImm-Lab/Novabrowse.

6
Genetic Diversity of Cytochrome P450 Genes in Apis mellifera Subspecies

Li, F.; Lima, D.; Bashir, S.; Yadro Garcia, C.; Lopes, A. R.; Verbinnen, G.; de Graaf, D. C.; De Smet, L.; Rodriguez, A.; Rosa-Fontana, A.; Rufino, J.; Martin-Hernandez, R.; Medibees Consortium; Pinto, M. A.; Henriques, D.

2026-03-24 genomics 10.64898/2026.03.20.713126 medRxiv
Top 0.1%
8.5%

The western honey bee (Apis mellifera) is an essential pollinator facing unprecedented threats from pesticide exposure. While pesticide resistance evolution is well documented in agricultural pests, our understanding of genetic variation in honey bee detoxification systems remains limited. This represents a missed opportunity, as harnessing naturally occurring detoxification diversity could provide new avenues for pollinator protection. Cytochrome P450 monooxygenases (CYPs), which are central to xenobiotic metabolism, offer a promising starting point. Here, we present the first comprehensive analysis of CYP genetic diversity in A. mellifera. We analysed the CYPome of 1,467 individuals representing 18 A. mellifera subspecies from 25 countries and identified 5,756 single-nucleotide polymorphisms (SNPs) in 46 CYP genes. Imputed McDonald-Kreitman testing revealed that 56% of non-synonymous CYP substitutions were driven by positive selection. Of the 1,302 haplotypes identified, 84% resided in CYP3, concentrated in the CYP9 and CYP6AS subfamilies implicated in xenobiotic detoxification. Population-level analysis of nucleotide diversity, Tajima's D selection signatures, FST-based differentiation, and McDonald-Kreitman testing pointed to CYP3 clan genes as the primary locus of adaptive variation. This work provides the first step toward building a comprehensive pharmacogenomic resource for honey bees, enabling the prediction of population-specific pesticide vulnerabilities and leveraging naturally occurring detoxification variants to enhance pollinator resilience - a critical step toward sustainable pollinator management.

7
A near chromosome-scale genome assembly of the Common pine sawfly (Diprion pini, Linnaeus, 1758)

Wutke, S.; Michell, C.; Lindstedt, C.

2026-03-21 genomics 10.64898/2026.03.19.712881 medRxiv
Top 0.2%
6.1%

The common pine sawfly, Diprion pini, is a widespread defoliator of pine forests across Europe and Asia, with outbreaks causing substantial ecological and economic damage. However, genomic resources for this species have been limited, hindering advances in molecular ecology or pest management. Here, we present a near chromosome-level reference genome for D. pini, generated using PacBio HiFi reads, Oxford Nanopore MinION long reads, and 10x Genomics linked reads. The final assembly is organized into mostly chromosome-sized scaffolds. It spans a length of 268 Mb, comprises 81 scaffolds, and has a scaffold N50 of 18.7 Mb. BUSCO analysis (hymenoptera_odb10) indicates a high genome completeness of 97.2%. At 22.7 kb, the mitochondrial genome is unusually large due to an extended non-coding control region (6,874 bp). Gene prediction identified 26,335 protein-coding genes, of which 12,769 were functionally annotated. Comparative analyses with other sawflies and Apocrita identified 2,472 proteins unique to D. pini, some of which are putatively associated with the processing of plant secondary metabolites. Notably, our genome assembly highlights that, when a closely related, high-quality reference genome is available, chromosome-scale assemblies can be generated without the need for Hi-C sequencing. The genome provides a valuable foundation for the development of improved monitoring and management strategies for D. pini outbreaks and contributes to advancing fundamental research on Hymenoptera evolution.

8
Fishing pressure induces changes in DNA methylation in genetically homogeneous marine metapopulations

Barcelo-Serra, M.; Mateman, C.; Pijl, A.; Risse, J.; Sepers, B.; Cortes-Pujol, M. A.; Alos, J.; van Oers, K.

2026-03-19 molecular biology 10.64898/2026.03.19.712898 medRxiv
Top 0.2%
5.2%

Trait-selective harvesting by fisheries can impose strong selective pressures on fish populations, driving changes in life history traits affecting fisheries productivity and ecosystem functioning. While the genetic consequences of harvesting have been extensively studied, the extent to which phenotypic variation reflects genomic evolution versus environmentally-induced plasticity remains unclear. Epigenetic mechanisms, such as DNA methylation, may mediate between these processes, serving as a rapid and reversible response to the selective pressures imposed by harvesting. In this study, we implemented an improved laboratory and bioinformatics protocol, epiGBS3, to examine genomic variation and DNA methylation patterns in the marine fish Xyrichtys novacula. The study spanned three replicated geographical areas each comprising two adjacent locations: an intensively exploited fishery and a no-take Marine Protected Area (ntMPA). A nested analysis design across the three areas revealed strong gene flow and no evidence of genetic structure. Nevertheless, nucleotide diversity was significantly reduced in fisheries relative to ntMPAs. We also found that DNA methylation levels differed between protected and exploited sites after controlling for age, suggesting that fishing may influence epigenetic changes independently of fisheries-induced age-truncation effects. This represents one of the first lines of evidence that fisheries can potentially shape epigenetic variation, supporting DNA methylation as a contributor to local adaptation under high gene flow and strong anthropogenic selection.

9
Species-specific versus community-wide assays in eDNA monitoring of European eel Anguilla anguilla: Trade-offs between detection sensitivity and the value of additional community data

Monaghan, A. I. T.; Sellers, G. S.; Griffiths, N. P.; Lawson Handley, L.; Hänfling, B.; Macarthur, J. A.; Wright, R. M.; Bolland, J. D.

2026-03-20 ecology 10.64898/2026.03.19.712641 medRxiv
Top 0.2%
4.8%

Effective monitoring of the critically endangered European eel (Anguilla anguilla) is essential for conservation planning and regulatory decision-making, particularly in heavily fragmented rivers. Environmental DNA (eDNA) methods offer sensitive alternatives to traditional surveys, but there is uncertainty around whether targeted assays or community-wide approaches are better suited to achieve monitoring objectives. We compared eDNA metabarcoding and species-specific quantitative PCR (qPCR) for detecting A. anguilla across 145 pumped catchments in the Fens, East Anglia, England. All sites were sampled once initially, and sites negative for A. anguilla were re-sampled based on metabarcoding results. This allowed comparison of detection rates from a single water sample and site-level retrospective identification of sites where qPCR could have identified A. anguilla in earlier samples. The findings were also set in the context of the wider biodiversity information generated by metabarcoding. From the initial (single) water sample, qPCR detected A. anguilla at seven more sites than metabarcoding (17 versus 10). With repeated sampling, metabarcoding detected A. anguilla at 43 sites, including all but one of the sites where qPCR detected A. anguilla, and ten sites where qPCR did not detect A. anguilla within the same number of samples. Indeed, the additional sampling effort required to detect A. anguilla with metabarcoding at sites also positive with qPCR was small relative to the overall sampling effort. Furthermore, metabarcoding additionally detected 28 non-target fish species alongside fish, amphibian and mammal species of conservation concern. Our results highlight trade-offs between target-species sensitivity and the broader ecological information provided by each method, and support metabarcoding as an effective tool for a holistic conservation approach, with the additional community data outweighing the marginally increased sensitivity of qPCR.

10
Accurate estimation of canine inbreeding using ultra low-coverage whole genome sequencing

Pellegrini, M.; Kim, R.; Rubbi, L.; Kislik, G.; Smith, D.

2026-04-07 bioinformatics 10.64898/2026.04.04.716453 medRxiv
Top 0.3%
4.0%

The measurement of inbreeding has gained significance across diverse fields, including population and conservation genetics, agricultural genetics, breeding programs for animals and plants, and wildlife management. This is because inbreeding increases homozygosity and lowers genetic diversity, rendering populations more vulnerable to environmental changes, diseases, and other stressors. High or mid-coverage whole genome sequencing (WGS) has been widely used for inbreeding estimation, but it is resource-intensive. We aimed to investigate the use of ultra low-coverage whole genome sequencing (ulcWGS) as a cost-effective alternative for inbreeding analysis. Domestic dogs were used for our study as their extensive breeding histories lead to populations with a wide range of inbreeding levels. We constructed a multi-breed reference panel from high-coverage WGS samples. Inbreeding in independent ulcWGS samples was then estimated using runs of homozygosity (RoH) and inbreeding coefficients (F). We modeled the relationship between these measures and sequencing depth using nonlinear regression, to generate inbreeding estimates relative to sequencing depth. Resulting relative RoH and F measurements were significantly correlated, with purebred dogs exhibiting more runs of homozygosity and higher inbreeding coefficients compared to mixed-breed dogs. Our findings demonstrate that ulcWGS can provide reliable and economical estimations of inbreeding, expanding accessibility to genetic monitoring.
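
The abstract does not say how its RoH-based measure was computed. As a minimal sketch of one common summary, the fraction of the genome covered by runs of homozygosity (often written F_ROH) can be calculated as below. The function name, the segment format, and the `min_length_bp` filter are assumptions made for illustration, and the depth-relative normalization the authors describe is not reproduced.

```python
def f_roh(roh_segments, genome_length_bp, min_length_bp=0):
    """Fraction of the genome covered by runs of homozygosity.

    roh_segments: iterable of (start_bp, end_bp) pairs (hypothetical input format).
    genome_length_bp: total length of the genome region that was scanned.
    min_length_bp: optional filter that ignores segments shorter than this.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    covered = 0
    for start, end in roh_segments:
        length = end - start
        if length >= min_length_bp:
            covered += length
    return covered / genome_length_bp


# Illustrative values only: two segments totalling 3 Mb on a 30 Mb region.
example = f_roh([(0, 1_000_000), (5_000_000, 7_000_000)], 30_000_000)
```

A length filter matters in practice because short homozygous stretches are expected by chance; the threshold to use depends on the data and is left to the caller here.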

11
Reference genomes of four miniature and non-miniature cypriniform fishes inhabiting acidic peat-swamp forest blackwaters of Southeast Asia

Sudasinghe, H.; Liu, Z.; Triginer-Llabres, L.; Hui Tan, H.; Britz, R.; Salzburger, W.; Peichel, C.; Rueber, L.

2026-03-24 genomics 10.64898/2026.03.21.713365 medRxiv
Top 0.3%
3.9%

The acidic blackwaters of Southeast Asia's peat-swamp forests represent some of the most extreme freshwater environments on Earth. Despite their very low pH values, limited nutrients, and hypoxic conditions, these blackwater habitats harbor a remarkable diversity of freshwater fishes, including multiple lineages that have independently adapted to these extreme conditions and, in some cases, exhibit extreme body miniaturization. These replicate evolutionary lineages therefore provide a powerful comparative framework to investigate adaptation to extreme environments and the genomic basis of miniaturization. Here, we present high-quality, annotated reference genomes for four cypriniform species endemic to these peat-swamp forest ecosystems: Paedocypris sp., Sundadanio atomus, Boraras brigittae, and Rasbora kalochroma. The first two are progenetic miniatures, including Paedocypris, comprising the smallest known fish, while B. brigittae represents a proportioned dwarf and R. kalochroma a non-miniature taxon. Genome sizes ranged from 401 to 1,290 Mb and heterozygosity from 0.34 to 1.7%. All genome assemblies achieved pseudo-chromosome-level contiguity, high k-mer completeness (>99%), and high BUSCO completeness (94.5-98.9%). Repeat analyses revealed lineage-specific differences in transposable element landscapes and abundances, while gene annotation identified notable intron length reduction in progenetic miniatures.

12
Enrichment Probe Sets Combining Universal and Lineage-Specific Targets Help Resolve Recalcitrant Lineages

Villa-Machio, I.; Masa-Iranzo, I.; Nürk, N. M.; Pokorny, L.; Meseguer, A. S.

2026-03-25 evolutionary biology 10.64898/2026.03.24.713849 medRxiv
Top 0.3%
3.9%

The combination of target capture sequencing (TCS) with low-coverage whole genome sequencing (lcWGS), an approach known as Hyb-Seq, has allowed the integration of natural history collections into the genomics revolution, transforming biodiversity research. To implement Hyb-Seq, a collection of genomic targets, often nuclear orthologs, is needed to design probes for TCS. In flowering plants, the universal Angiosperms353 probe set has been proven resolutive at multiple evolutionary scales, with caveats. Malpighiales is known to be one of the most challenging flowering plant orders to resolve. Within this order, the clusioid clade (~2.2K species, 94 genera, five families) is no exception. To resolve phylogenetic relationships in this recalcitrant clade, we design a custom probe set, the Clusioids626 kit, composed of 39,936 120-mer probes targeting 626 nuclear orthologs (~6.6M nucleotides). This probe set includes all Angiosperms353 targets and 273 clusioid-specific ones, carefully chosen taking copy-number, length evenness, and phylo-informativeness into account. We test our probe set on 70 accessions representing all families and tribes in the clusioid clade. On average, 50.4% of TCS reads mapped to our targets, recovering a median of ~600 orthologs. Relationships for all clusioid families are fully resolved for our nuclear targets. Additionally, 105 plastid coding DNA sequences were retrieved from the lcWGS fraction. A strong cyto-nuclear conflict was detected. The Clusioids626 kit performs better than the universal Angiosperms353 enrichment panel alone. Our kit design workflow can be extended into other lineages for which a universal probe set exists but more resolution is needed.

13
Inference of population demographic history captures differing evolutionary signals based on the number of individuals in the dataset

Mah, J. C.; Lohmueller, K. E.

2026-04-08 evolutionary biology 10.64898/2026.04.07.716740 medRxiv
Top 0.4%
3.0%

Accurate estimation of population demographic history is central to population genetics yet remains challenging due to the sensitivity of inference methods to the number of individuals and the demographic scenario assumed in inference. The site-frequency spectrum (SFS) of neutral variants, a widely used summary statistic of genetic variation, is particularly sensitive to demographic processes, but studies have shown that qualitative results from demographic inference, i.e., population expansion vs. contraction, can depend strongly on the number of individuals in the dataset. Here, we analyzed two simulated datasets and one empirical dataset characterized by an ancient population bottleneck followed by a recent population expansion. Fitting a two-epoch demographic model across a range of sample sizes, we found that inference shifted from signals of ancient population contraction at small sample sizes to signals of recent population expansion at large sample sizes. Other summary statistics, including Tajima's D and the proportion of singletons, also changed with sample size. We found that these changes of inferred evolutionary signals under a two-epoch model can be explained by the epoch which contributes the highest mean proportion of coalescent branch lengths. Our results highlight that demographic inference depends critically on the number of individuals analyzed and suggest that analyzing datasets at multiple sample sizes can reveal complementary aspects of population history.
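
To make the sample-size dependence of these summary statistics concrete, the following minimal sketch computes Tajima's D and the singleton proportion from an unfolded SFS, and projects the SFS down to a smaller sample by hypergeometric subsampling. This is not the authors' pipeline. The function names and the list-based SFS layout (entry i-1 holds the number of sites with derived-allele count i) are assumptions chosen for illustration, and Tajima's D uses the standard constants from Tajima (1989).

```python
from math import comb, sqrt


def project_sfs(sfs, m):
    """Project an unfolded SFS for n haplotypes down to m < n haplotypes.

    sfs[i-1] is the number of sites with derived-allele count i (i = 1..n-1).
    Sites that become monomorphic in the subsample are dropped.
    """
    n = len(sfs) + 1
    projected = [0.0] * (m - 1)
    for i, count in enumerate(sfs, start=1):
        for j in range(1, m):
            # Hypergeometric chance of seeing j derived copies among m draws.
            projected[j - 1] += count * comb(i, j) * comb(n - i, m - j) / comb(n, m)
    return projected


def singleton_proportion(sfs):
    """Share of segregating sites whose derived allele appears exactly once."""
    return sfs[0] / sum(sfs)


def tajimas_d(sfs):
    """Tajima's D computed from an unfolded SFS (requires n >= 4 for a positive variance)."""
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    # Mean pairwise differences implied by the SFS.
    pi = sum(c * 2 * i * (n - i) / (n * (n - 1)) for i, c in enumerate(sfs, start=1))
    a1 = sum(1 / k for k in range(1, n))
    a2 = sum(1 / k**2 for k in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Illustrative SFS for 20 haplotypes with an excess of rare variants (made-up counts).
full_sfs = [400, 60, 30, 20, 15, 12, 10, 9, 8, 7, 6, 6, 5, 5, 4, 4, 4, 3, 3]
for m in (5, 10, 20):
    sub = full_sfs if m == 20 else project_sfs(full_sfs, m)
    print(m, round(tajimas_d(sub), 3), round(singleton_proportion(sub), 3))
```

Projecting one dataset to several sample sizes in this way is a cheap first check on whether a statistic is stable before fitting any demographic model.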

14
Machine Learning-Enhanced Nanopore ITS Analysis: Evaluating CPU-GPU Pipelines for High-Accuracy Fungal Taxonomic Resolution

Albuja, D. S.; Maldonado, P. S.; Zambrano, P. E.; Olmos, J. R.; Vera, E. R.

2026-04-07 bioinformatics 10.64898/2026.04.06.716835 medRxiv
Top 0.4%
2.8%

Accurate fungal species identification is critical for microbial ecology, food safety, and plant pathology. However, morphological limitations and genomic complexity hinder this process. Molecular markers such as the ITS region, along with Oxford Nanopore long-read sequencing, offer a robust solution, albeit limited by error rates in homopolymeric regions and a high dependence on advanced computational resources (GPUs) to achieve high accuracy. This study benchmarks two bioinformatics workflows on a multiplexed dataset of complex fungal communities to address this technological gap: a CPU-based workflow optimized using a Bayesian machine learning engine and a GPU-accelerated workflow incorporating "super high accuracy" (SUP) models and refinement with neural networks. The results establish a scalable framework for evaluating the impact of computational architecture on final taxonomic resolution. It is demonstrated that GPU processing maximizes data retention and species-level accuracy by correcting systematic errors. Alternatively, implementing automated hyperparameter optimization in CPU environments stabilizes sequence clustering and achieves high taxonomic concordance at the genus level. This conceptual advance validates the feasibility of performing ITS metabarcoding analysis in resource-constrained infrastructures, thus providing the scientific community with a reproducible protocol that balances the need for taxonomic precision with hardware availability.

15
Feeding ecology and ecological risks of the invasive fish Coreoperca herzi revealed by gut content DNA and environmental DNA metabarcoding

Tsuji, S.; Hibino, Y.; Morimoto, S.; Miuchi, Y.; Watanabe, K.

2026-03-24 ecology 10.64898/2026.03.20.713311 medRxiv
Top 0.4%
2.8%

Understanding the dietary patterns of introduced predators is essential for assessing their impacts on freshwater ecosystems. Here, we investigated the feeding ecology of the invasive Korean perch (Coreoperca herzi) introduced to the Oyodo River system, Japan, by integrating gut content DNA metabarcoding and environmental DNA (eDNA) metabarcoding. Fifty specimens were collected, and prey taxa were identified using metabarcoding targeting fish, aquatic insects, and crustaceans. In parallel, eDNA metabarcoding of habitat water samples was used to assess prey availability and selectivity. The results revealed that the Korean perch preys extensively on aquatic insects and fish. Aquatic insect prey were dominated by epilithic clinger taxa inhabiting stone surfaces, particularly mayflies, suggesting visual-mediated prey selection. Fish predation was frequently detected even in small individuals (<100 mm SL), in contrast to previous studies based on conventional methods, indicating that piscivory begins early and ontogenetic dietary shifts are not pronounced. Furthermore, quantitative fish eDNA analysis showed a positive relationship between eDNA concentrations of prey species and predation frequency, indicating opportunistic feeding on abundant, size-accessible prey. By applying two metabarcoding approaches, this study provides an integrated assessment of prey utilisation and environmental context, highlighting ecological risks posed by the Korean perch to freshwater communities in Japan.

16
Development of the 4TREE SNP array, a forest multispecies array to enhance European Breeding and conservation programs in pine, poplar and ash.

Guilbaud, R.; Bagnoli, F.; Ben-Sadoun, S.; Biselli, C.; Buret, C.; Buiteveld, J.; Cativelli, L.; Copini, P.; Drouaud, J.; Esselink, D.; Fricano, A.; Benoit, V.; Kelly, L. J.; Kodde, L.; Metheringham, C. L.; Pinosio, S.; Rogier, O.; Segura, V.; Spanu, I.; Tumino, G.; Buggs, R. J.; Gonzalez-Martinez, S. C.; Vietto, L.; Nervo, G.; Jorge, V.; Dowkiw, A.; Smulders, M. J.; Sanchez, L.; Vendramin, G. G.; Bastien, C.; Faivre Rampant, P.

2026-03-23 genomics 10.64898/2026.03.21.711309 medRxiv
Top 0.4%
2.7%

Within the framework of the European Adaptive BREEDING for Better FORESTs project (B4EST, https://b4est.eu/), we have developed genotyping tools for Poplar, Ash, and Pine forest tree species. SNP arrays are attractive genotyping tools because of the user-friendly genotype calling system and the robust transferability among laboratories. Here we describe the development of an Axiom SNP array for Pinus pinaster (13,407 SNPs), Pinus pinea (5,671 SNPs), Poplar spp. (13,408 SNPs), and Fraxinus spp. (13,407 SNPs) based on a two-step process. We first assembled a high-density (>100,000 SNPs/species) screening array that served to test a large panel of candidate SNPs on a diversity panel involving at least 120 individual trees per species or species group. In the second step, we selected and combined the most informative SNPs to build the final 50,000 SNP 4TREE array. This approach resulted in high genotyping success rates, including for species lacking previously validated high-quality SNP resources. The 4TREE SNP array provides a valuable and transferable genomic tool to support genomic prediction, breeding, and adaptive management of forest tree species.

17
Robust Random Forests for Genomic Prediction: Challenges and Remedies

Lourenco, V. M.; Ogutu, J. O.; Piepho, H.-P.

2026-04-01 bioinformatics 10.64898/2026.03.30.715203 medRxiv
Top 0.4%
2.6%

Data contamination--from recording errors to extreme outliers--can compromise statistical models by biasing predictions, inflating prediction errors, and, in severe cases, destabilizing performance in high-dimensional settings. Although contamination can affect responses and covariates, we focus on response contamination and evaluate Random Forests through simulation. Using a synthetic animal-breeding dataset, we assess robust Random Forests across several contamination scenarios and validate them on plant and animal datasets. We thereby clarify the consequences of contamination for prediction, develop a robust Random Forest framework, and evaluate its performance. We examine preprocessing or data-transformation strategies, algorithmic modifications, and hybrid approaches for robustifying Random Forests. Across these approaches, data transformation emerges as the most effective strategy, delivering the strongest performance under contamination. This strategy is simple, general, and transferable to other Machine Learning methods, offering a remedy for robust genomic prediction. In real breeding data, robust Random Forests are useful when substantial contamination, phenotypic corruption, misrecording, or train-deployment mismatch is plausible and the goal is to recover a latent signal for genomic prediction and selection; ranking-based robust Random Forests are the dependable first option, whereas weighting-based Random Forests should be used only when their weighting scheme preserves rank structure and improves prediction. Robustification is not universally necessary, but it becomes important when contamination distorts the link between observed responses and the predictive target; standard Random Forests remain the default for clean data, whereas robust Random Forests should be fitted alongside them whenever contamination is plausible, with the final choice guided by data, trait, and breeding objective. 
Author summary: Machine learning (ML) methods are widely used for prediction with high-dimensional, complex data, and supervised approaches such as Random Forests (RF) have proved effective for genomic prediction (GP) and selection. Yet their performance can be severely compromised by data contamination if the algorithms rely on classical data-driven procedures that are sensitive to atypical observations. Robustifying ML methods is therefore important both for improving predictive performance under contamination and for guiding their practical use in high-dimensional prediction problems. To address this need, we develop robust preprocessing, algorithm-level, and hybrid strategies for improving RF performance with contaminated data. Using simulated animal data, we show that ranking- and weighting-based robust RF provide the strongest overall compromise for genomic prediction and selection under contamination. Validation on several plant and animal breeding datasets further shows that the benefits of robustification are not universal, but depend on the dataset, trait, and breeding objective. Although motivated by RF, the framework we propose is general, practical, and readily transferable to other ML methods. It also offers a basis for deciding when robustness should complement standard RF rather than replace it outright.

18
Whole-genome pre-amplification as a viable approach for genomic screening of FFPE-derived DNA samples

Guerrero Quiles, C.; Lodhi, T.; Sellers, R.; Sahoo, S.; Weightman, J.; Breitwieser, W.; Sanchez Martinez, D.; Bartak, M.; Shamim, A.; Lyons, S.; Reeves, K.; Reed, R.; Hoskin, P.; West, C.; Forker, L.; Smith, T.; Bristow, R.; Wedge, D. C.; Choudhury, A.; Biolatti, L. V.

2026-03-29 molecular biology 10.64898/2026.03.26.714414 medRxiv
Top 0.5%
2.1%

Whole-genome sequencing (WGS) enables comprehensive analysis of tumour genomes, but its use in formalin-fixed paraffin-embedded (FFPE) samples is limited by DNA fragmentation and low yields. Whole-genome amplification (WGA) methods such as multiple displacement amplification (MDA) can boost DNA availability but distort copy-number alteration (CNA) profiles. DNA ligation-mediated MDA (DLMDA) mitigates this bias by reconstituting fragmented templates, yet its performance in FFPE-derived DNA remains uncertain. We compared paired DLMDA pre-amplified (2 h, 8 h) and non-pre-amplified FFPE prostate tumour samples from 22 archival blocks (5, 15 and 20 years old). DLMDA increased DNA yield by 42- to 86-fold, with global CNA patterns largely preserved. However, DLMDA significantly reduced the number of detected CNA deletions and amplifications. These effects were independent of both block age and reaction time. CNA dropouts were randomly distributed across the genome, indicating that DLMDA does not introduce regional bias. Our results show that DLMDA enables robust DNA yield recovery and avoids false-positive CNA artefacts, but at the cost of reduced CNA sensitivity. While suitable for CNA screening pipelines through WGS, further improvements are required to minimise the false-negative risk and improve the technique's sensitivity for FFPE-based genomics.

19
Radiographic assessment of bone maturation as a tool for age estimation in common dolphins (Delphinus delphis)

Hanninger, E.-M. F. F.; Barratclough, A.; Betty, E. L.; Anderson, M. J.; Perrott, M. R.; Bowler, J.; Palmer, E. I.; Peters, K. J.; Stockin, K. A.

2026-04-07 zoology 10.64898/2026.04.05.716530 medRxiv
Top 0.5%
2.1%

We present the first radiographic ageing framework for common dolphins (Delphinus delphis), based on ossification and epiphyseal fusion patterns in the pectoral flipper, demonstrating higher reliability for chronological age estimation than currently available epigenetic approaches for this species. Using individuals of known dental age, we calibrated two modelling approaches to predict dental age from radiographic bone scores: 1) a univariate polynomial regression using a total bone score (sum of 16 scores across all assessed flipper bones), and 2) a multivariate canonical analysis of principal coordinates (CAP) incorporating 16 individual bone-score variables. Both approaches successfully predicted dental age from skeletal ossification patterns. For an age range of 0 to 24 years, polynomial regression demonstrated high predictive accuracy with median absolute errors (MAEs) of 1.25 years in females (Spearman's ρ = 0.93, R² = 0.90) and 1.08 years in males (ρ = 0.95, R² = 0.86). The CAP model yielded MAEs of 1.35 years in females (ρ = 0.90, R² = 0.85) and 1.80 years in males (ρ = 0.94, R² = 0.84). Notably, both radiographic bone ageing models achieved equal or lower median absolute errors and higher coefficients of determination than a recently developed epigenetic clock for common dolphins derived from the same population (MAE = 1.80, Pearson's correlation (r) = 0.91, R² = 0.82). When applying the bone ageing models to individuals of unknown dental age, both models produced age estimates consistent with expected life-history stages (foetus, neonate, juvenile, subadult, adult), although accuracy declined in dolphins above 20 years, likely as a consequence of subtle age-related variation in skeletal changes in this species. Radiographic ageing provides an accurate non-invasive tool for demographic assessment to support conservation management of common dolphins.

20
Barcode Crosstalk in ONT Multiplex Sequencing: Quantification and Mitigation Strategies

Scharf, S. A.; Spohr, P.; Ried, M. J.; Haas, R.; Klau, G. W.; Henrich, B.; Pfeffer, K.

2026-03-28 molecular biology 10.64898/2026.03.27.714689 medRxiv
Top 0.5%
2.1%

Multiplexing samples in long-read sequencing with Oxford Nanopore Next Generation Sequencing Technology (ONT) by ligating specific native barcodes to individual DNA samples substantially increases sequencing throughput and reduces sequencing costs. However, this advantage carries the risk of barcode misassignment (crosstalk). When performing ONT multiplex sequencing, we observed misassigned barcodes, so-called barcode crosstalk, after ONT library preparation according to the standard protocol, particularly in samples with low input DNA concentrations. We assumed that these barcode misassignments are largely due to misligation of remaining native barcodes during the subsequent sequencing adapter ligation. To systematically investigate and quantify barcode crosstalk, genomic DNA (gDNA) from four bacterial type strains with different DNA input concentrations was prepared using three protocols for library preparation: the Nanopore standard protocol (protocol A: version valid until July 2, 2025), the new Nanopore protocol (protocol B: version from July 2, 2025), and an in-house protocol with pooling of the barcoded samples only after the sequencing adapter ligation step (protocol C: in-house). All samples were sequenced on a Nanopore PromethION device. The results clearly showed that protocol A resulted in pronounced barcode crosstalk, especially in samples with low DNA input concentrations (up to 2.4% misassigned reads). The ONT adjustment in protocol B (altered washing buffer vs. protocol A) significantly reduced barcode crosstalk to below 0.01%, whereas protocol C eliminated barcode crosstalk virtually completely. These observations emphasize that sequencing results obtained with older ONT native barcoding protocol variants should be critically reviewed.
The newer ONT barcoding protocol is preferable for sequencing, but it does not completely eliminate the barcode crosstalk effect. In conclusion, for low DNA input and high-accuracy sequencing, protocol C is recommended.